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Abstract 

We consider the issue of constructing PRESS statistics and coefficients of prediction 
for a class of beta regression models. We aim at displaying measures of predictive 
power of the model regardless goodness-of-fit. Monte Carlo simulation results on the 
finite sample behavior of such measures are provided. We also present an application 
that relates to the distribution of natural gas for home usage in Sao Paulo, Brazil. 
Faced with the economic risk of to overestimate or to underestimate the distribution 
of gas was necessary to construct prediction limits using beta regression models 
(Espinheira et ah, 2014). Thus, it arises the aim of this work, the selection of best 
predictive model to construct best prediction limits. 
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1 Introduction 


The beta distribution is commonly used to model random variables that assume values 
in (0,1), such as percentages, rates and proportions. The beta density can display quite 
different shapes depending on the parameter values. Oftentimes the variab le of in terest 
is related to a set of independent (explanatory) variables. Ferra ri and Cribari-Neto (20041 ) 
introduced a regression model in which the response is beta-distributed, its mean be¬ 
ing related to a linear predictor through a link function. The linear predictor includes 
independent variables and regression parameters. Their model also includes a precision 
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parameter whose reciprocal can be viewed as a dispersion measure. In the standard for¬ 
mulation of the beta regression model it is assumed that the precision is constant across 
o bservation s. H oweve r, in many practical situations this assumption does not hold. 
Smith son and Verkuil en (2 006 1) consider a beta regression specification in which disper¬ 
sion is not constant, but is a function of covariates and unknown parameters. Parameter 
estimation is carried out by maximum likelihood (ML) and standard asymptotic hy¬ 
pothesis testing can be easily performed. Practitioners can use the betareg package, 
which is available for the R statis tical software (htt p://www.r-project.org:), for fit¬ 
ting beta regressions. [Cribar i -Neto and Zeileis (20101 ) provide an overview of varying 
dispersion beta regression modeling using the betareg package. 


Recently lEspinheira et al. (20141 ) built and evaluated bootstrap-based prediction in¬ 
tervals for the class of beta regression models with varying dispersion. However, a 
prior approach it is necessary, namely: the selection of the model with the best predic¬ 
tive ability, regardless of the goodness-of-ht. Indeed, the model selection is a crucial 
step in data a nalysis, since all inferential performance is based on the selected model. 
Ba ye r a nd Cribari-Neto ( 2014 ) evaluated the performance of different selection criteria 
models in samples of finite size in beta regression model, such as Akaike Information 
Criterion (AIC) ( Akaike. 1973 b Schwarz Bayesian Criterion (SBC) ( Schwarz. 1978 b 
residual sum of squares (RSS), and various functions of RSS such as the coefficient 
of determination, R 2 and the adjusted R 2 . However, these metho ds do not offer any 
insight about the quality of the predictive values. In this context, Allen (197 4). pro¬ 
posed the PRESS (Predictive Residual Sum of Squares) criterion, that can be used as 
an indication of the predictive power of a model. The PRESS statistic is independent 
from the goodness-of-fit of the model, since that its calculation is made by leaving 
out the observations that the model is trying to predict. The PRESS statistics can be 
vie wed as a sum of squares of external residuals. Thus, similarly of the approach of R 2 
Mediavilla et ah f 2008h proposed a coefficient of prediction based on PRESS namely 
P 2 . The P 2 statistic can be used to select models from a predictive perspective adding 
important information about the predictive ability of the model in various scenarios. 


2 On beta regression residuals 


Let yi,...,y n be independent random variables such that each y t , for t — 1,..., n, is 
beta distributed, i.e., each y t has density function given by 


/{yti l^t-i (ftt) 


m) 


rWt)r((i -fk)(/>t) 


Vt 


0 < y t < 1, (1) 


where 0 < /x t < 1 and (j) t > 0. Here, E (y t ) = y, t and Var (y t ) = V(u+) /( 1 + <&), where 
V{nt) = Ah(l— lit )-In the beta regression model introduced by Ferrari a nd Cribari-Neto (2004!) 
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the mean of yt can be written as 


g(vt) = xl(3 = T) t . ( 2 ) 

In addition to the relation given in fl2lh it is possible to assume that the precision 
parameter is not constant and write 

h ((f*t) = z] 7 = i? t . (3) 

In ([2]) and (J3J), rjt and are linear predictors, j3 — (j3i, ..., (3k) T and 7 = ( 7 1; ..., 7 g ) T 
are unknown parameter vectors (/3 G 7 G BP), x t i, , x t k and z t i, ..., z tq are fixed 
covariates (k + q < n) and g(-) and h(-) arc link functions, which are strictly increasing 
and twice-differentiable. 


The PRESS statistic is based on sum of external residualsjrbtained from exclusion of 
observations. For beta regression models Ferrari et al. (2C)Tlt ) present a standardized 
residual obtained using Fisher’s scoring iterative algorithm for /3 under varying dis¬ 
persion. Here, we propose a new residual based on a combination of ordinary residuals 
obtained using the algorithms for /3 and 7 under varying dispersion. At the outset, con¬ 
sider the Fisher’s scoring iterative algorithm for estimating j3 (see the Appendix lAl). 
From (lA.Sp it follows that the mth step of the scoring scheme is 


_ p{m) + rx T ^ m ^W ( ' m ^Xy 1 ^ m '>X T T^ m \y* - 


(4) 


where the tth elements of the vectors y* and p* are given, respectively, by 

V*t = l°g{W(l - Vt)} and y* t = i){ix t (j) t ) - -0((1 - Ah)0t), (5) 

'ijj(-) denoting the digamma function, i.e., tp(u) = dlogr(-u)/dn for u > 0. The matrices 
T and W are given in (I A. 1 (1 and (IA.3I) . respectively, A" is an n x k matrix whose tth 
row is x[ and $ = diag(</>i,..., </> n ). Note that p* = E(y*) (see flA. 6 |) : Appendix IA1) . 
Similarly, from (1A. 5[1 it follows that the mth step of the scoring scheme for 7 is given 
by 

7 (m+ 1) = 7 (m) + (Z T D {m) Z)- 1 Z T H^ m) a {m \ ( 6 ) 

where the tth element of at is give by 

at = yt(yt - y* t ) + iog(i - yt) - ^((1 - tp)</>t) + V’Ot) (7) 

and the matrices H and D are given in flA.2p and ( 1A.4I) . respectively, and Z is an 
n x q matrix that tth row is zj. It is possible to write the iterative schemes in 
(HD and m i n terms of weighted least squares regressions, respectivclly as /3^ m+1 ^ = 
(X T $W|pWx)- 1 fHx T iyH u W an d 7 (™+P = (Z T D^Z)~ 1 Z T ■ Where 
+ W~ l ^ n) T^ m \y* — with 7 = ( 7 !,... , 7 „,) T = X(3, u + 
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D 1 -"'' I //W fl ( ra ) ] with ft = (t?i, ... , tU ) 7 = Z 7 and a t given in (J7J). Upon convergence, 
P = (X T $WX)~ 1 6x T Wu 1 and 7 = (Z T D Z)- 1 Z T Du 2 , where 

—1 ~ —1 ^ ( 8 ) 

Ui — fj + W T(y*—j2*) and u 2 — 'd + D Ha. 


Here, W, T, H and D are the matrices W, T, H and D respectively, evaluated at 
the maximum likelihood estimator. We note that /3 and 7 in (JHJ) can be viewed as 

the least squares estimates of [3 and 7 obtained by regressing $ l / 2 W u\ and D u 2 

^ ^—1/2 _1 ^2 

on Q l / 2 W X and D Z, respectively. The residuals ordinary obtained of interactive 
process of (3 and 7 are given by = $ 1 / 2 fU (ui — rf) — $ W T(y* — JT) 
and r 7 = D (u 2 — d) = D Ha , respectively. Hence, using the definitions of the 


matrices given from (1A.1I) to ( 1A.5I) . we can rewrite the residuals obtained from the 
iterative process of (3 and 7 respectively, as 


r 0 Vt 

' t ~ 


Vt 


and 


7 

77 = 


a t 




(9) 


where v t and <7 are given in (1A.3I) and (1A.4I) . respectively. Thus, we propose a new 
residual based on r i5 and r 7 , which we shall refer to as the combined residual rf 7 = 
(y* — ji *) + a t where y* and //* are given in (J5]). Assuming that y t and (j) t are known 
and from (1A.6|) to (1A.10D it follows that Var(rf 7 ) = ( tl with 


Ci = (1 + - iH)<h) - X’X’h)' 


( 10 ) 


Then, we can define the following standardized combined residual: 



(y* ~ fit ) + 

\/Z 


( 11 ) 


Here, Q is C t in (ITU]) evaluated at j2 t e 4> t . It is important to note that when 0 is constant 
it is only necessary replace 0 t by 0 at all elements of (ITT]) . We should emphasize that 
here we are just interested in evaluating the r ^ 7 in the composition of the PRESS 
statistic. 


3 P 2 Statistics 


Consider the linear model, Y = X (3 + s where Y is a vector n x 1 of responses, X is a 
known matrix of covariates of dimension n x p, [3 is the parameter vector of dimension 
pxl and £ is a vector n x 1 of errors distributed as N n (Q] a 2 I n ). Let /3 = (X T X)~ 1 X T y, 
e t — Vt~ x JP, V — X J/3 and let (3^ be the estimate of /3 without the i th observation and 
y(p = xJ 0 (p be the case deleted predicted value of the response when the independent 
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variable has value x t . Thus, for multiple regression PRESS = J2t=i(Vt — V{t)) 2 which 
can be rewritten as PRESS = J2t=i(Vt ~ Vt) 2 /{ 1 — h-tt) 2 , where h u is the tth diagonal 
element of the matrix X(X T X)~ x X T . 

In the beta regression model f3 in (jSJ) can be viewed as the least squares estimate of /3 
obtained by regressing 


y = $ 1/2 W 1/2 Ul on X = $] /2 W 1/2 X. 


( 12 ) 


Thus, the prediction error is y t — yu) = ^ 2 w\ 
proposed by (jPregibon, 1 9811 ) and fact that 


ff 2 wl /2 xj^ t) . Using the ideas 


At) = $ 


(X T ^WX)- 1 xtff 2 w} / 2 r^ 
(1 - h* u ) 


where rf is given in d^J) and h* t is the tth diagonal element of 

H* = (W$) 1/2 X(X$WX)- 1 X T ($W) 1/2 

it then follows that y t — y^ t ) = {rf}/( 1 — h* tt ). Finally, for the beta regression model 
the PRESS statistic is given by 


n n ( y \ 

press= rV - (13) 

In (jT3l) the tth observation is not used in fitting the regression model to predict yt , then 
both the external predicted values y^ t ) and the external residuals ep) are independent 
of y t . This fact enables the PRESS statistic to be a true assessment of the prediction 
capabilities of the regression model regardless of the overall quality of the fit of the 
model. 


Considering the same approach of the coefficient of determination R 2 , we can think in 
a prediction coefficient based on PRESS, namely 


PRESS 

SST(t) ’ 


(14) 


wherein SST (q = Y%=i{yt ~~ V{t)) 2 and y^ is the arithmetic average of the y ^, t = 
It can be shown that SST( t ) = (n/n — p) 2 SST, wherein p is the number 
of model parameters. In the beta regression model with varying dispersion, SST = 
J2t=i(llt ~ y) 2 -, y is the is the arithmetic average of the y t = t — 1,... ,n 

given in (fT21) and p = k + q. 


Cook and Weisberg (1983) suggest other versions of PRESS statistics based on differ¬ 


ent residuals. Thus, we present another version of PRESS statistics and P 2 associated 
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considering a new residual presented in (ITT]) , such that 


PRESS M = Y 


r hi 

r P ,t 


t= 1 


i - m 


and 


P 2 — 1 _ 

"P7 - 1 


PRESS ; 


hi 


SSTi 


(d 


(15) 


respectively. It is noteworthy that the measures A 2 and P 2 are distinct, since that the 
P 2 propose to measure the quality of fit of the model and the P 2 and P| measure 
the predictive power. Additionally, P 2 and P| are not positive measure. In fact, the 
PRESS/SST/^ is a positive quantity, thus the P 2 and the Pf 7 associated given in 
m and (I15p . respectively, take values in (—oo; 1]. The closer to one the better is the 
predictive power of the model. In order to check the goodness-of-fit of the estimated 
model, we used the approach suggested by Baver and Cribari-Neto 12014 1 for beta 
regression models with varying dispersion, a version of R 2 based on likelihood ratio, 
given by: R 2 lr = 1 — (L nu u/Lf it ) 2 / n , wherein L nu u is the maximum likelihood achievable 
(saturated model) and Lf lt is the achieved by the model under investigation. 


3.1 Monte Carlo results 


The Monte Carlo experiments were carried out using using both fixed and varying 
dispersion beta regressions as data generating processes. All results are based on 10,000 
Monte Carlo replications. Table Q] contains numerical results for the fixed dispersion 
beta regression model as data generating processe, given by 

log ( ~r~ ) — Pi + 02 x t 2 + P 3 x t3 + P 4 X U + P 5 x t5i t = 1 , ... , 71 , 

V 1 - ht) 

The covariate values were independently obtained as random draws of the following 
distributions: X ti ~ 17(0,1), i = 2,... ,5 and were kept fixed throughout the experi¬ 
ment. The precisions, the sample sizes and the mean response are, respectively, 0 = 
(50,148,400), n = (40, 80,120), // G (0.005, 0.12), /j G (0.90, 0.99) and // G (0.20, 0.88). 
To investigate the performances of statistics in the omission of covariates, we consid¬ 
ered the Scenarios 1, 2 and 3, in which are omitted, three, two and one covariate, 
respectively. In the fourth scenario the estimated model is correctly specified. Addi¬ 
tionally we calculate the R\ r for the same scenarios. The results in Table [1] show that 
the values of all statistics increase as important covariates are included in the model. 
Statistics behave similarly as the sample size and the precisions values indicating that 
the most important factor is the correct specification of the model. Considering the 
three ranges for the /! it should be noted that the statistic values are considerably 
larger when /! G (0.20, 0.88) and the values approaching one when the estimated model 
is closest to the true model. For instance, in Scenario 4 for n = 40, 0 = (50,148,400) 
the values of P 2 and R 2 lr are, respectively, (0.8354, 0.9357, 0.9748) and (0.8349, 0.9376, 
0.9758). 
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The statistics finite sample behavior substantially change when /i G (0.90; 0.99). It is 
noteworthy the reduction of the statistic values, revealing the difficulty in to fit the 
model and make prediction when fi m 1. Indeed, in this range of // is more difficult to 
make prediction that to fit the model. For example, in Scenario 1, when three covariates 
are omitted from the model, when n = 40 and 0 = (50,148,400) the P 2 values equals, 
0.0580, 0.0636 and 0.0972 whereas the the R 2 lr values are 0.1553, 0.1999 and 0.2496, 
respectively. Similar results were obtained for n = 80,120. Even when for the correctly 
specified four covariate model (Scenario 4) the predictive power of the model is more 
affected than the quality of fit of the model by the fact of /i ~ 1. In this situation, it is 
noteworthy that the finite sample performances predictive power model improve when 
the value of the precision parameter increases. For instance, when n = 120 and 0 = 
(50,148,400) we have P 2 = (0.0272,0.2222,0.5622) and P| 7 = (0.063,0.5348,0.8381), 
respectively. Here it is possible see that the P| statistic always shows larger values than 
the P 2 statistic when the mean responses are close to of the upper limit of the standard 
unit interval. However, the two measures behave similarly when used to investigate 
model misspecification. 


The same difficulty in obtaining predictions and in fitting the regression model occurs 
when fi G (0.005,0.12). Once again the greatest difficulty lies on the predictive power 
of the model. It is also noteworthy that when /i 0 the point prediction becomes even 
less reliable than when /i ~ 1, since the P 2 and P| 7 values decreased substantially and 
become considerably distant from the R 2 lr values. When the mean responses are close 
to of the lower limit of the standard unit interval, the P 2 7 seems to be more able in 
identify poor predictions. For instance, in Scenario 4 (model correctly specified; four 
covariates) when n = 120 and 0 = (50,148,400), we have P 2 = (0.0464, 0.2716, 0.6322) 
and Pf 7 = (0.0362,0.0468,0.0603), respectively. 

We have also carried out Monte Carlo simulations using a varying dispersion beta re¬ 
gression model, in which we increased the number of covariates, used different covariates 
in the mean and precision submodels. In this case the data generating process and the 
postulated model is the same . We report results for A = (20, 50,100), n = (40, 80,120), 
[i G (0.20,0.88), ji G (0.90,0.99) and fi G (0.005,0.12). Here, 


A 


max {0t} 


min {0 t }’ 


(16) 


is the measure the intensity of nonconstant dispersion. The covariate values in the 
mean submodel and in the precision submodel were obtained as random draws from 
the 7/(0,1) and U{— 0.5, 0.5) distributions, respectively, such that the covariate values 
in the two submodels are not the same. At the end, we also considered a covariate 
values generated from 0 3 ) (Student’s t-distribution with 3 degrees of freedom). The 
results are presented in Table [2j We should emphasize that were generated only n = 40 
covariates values and the n — 80,120 covariates values are replications of original set. 
In this sense, the intensity of nonconstant dispersion remains the same over the sample 
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Table 1 

Statistic values. True model: g(f+ t ) = log(/j t /(l - fi t )) = fi\+ P 2 x t2 + P3 %t 3 + Pa x t A + Ph Xt5 , 
t = 1,n, p fixed. Misspecification: omitted covariates (Scenarios 1, 2 and 3). 



Scenarios 


Scenario 




Scenario 



Scenario 



Scenario 




Estimated 

a(+t) =01 + 02 x t 2 

a(+t) = 01+02 x t 2 

a(+t) =01+02 x t 2 

a(+t) 


= 01+02 x t 2 + 


model 





+03 x t3 


+/?3 x t3 + 04 

x t4 

03 *t3 

+ 0i XtA + 05 x t 5 

P 

/x £ (0.20, 0.88) 

n 

<i> 

50 

150 


400 

50 

150 

400 


50 

150 

400 

50 


150 


400 

40 

p2 


0.392 



0.457 

0.501 

0.518 



0.655 

EE! 

0.835 


0.935 




P M 

0.454 

0.471 


0.478 

0.567 

0.599 

0.611 



0.754 


0.856 


0.938 


0.974 


R 2 lr 

0.354 

0.390 


0.405 

0.467 

0.514 

0.532 


EH 

0.674 


0.857 


0.946 


0.979 

80 

p2 




EH 

0.439 

0.487 

0.505 



0.642 

0.668 

EH 


0.929 


0.972 


P P~r 

0.437 

0.457 



0.551 

0.587 

0.601 


0.689 

0.745 

0.768 



0.932 


0.971 


r2 lr 

0.351 

0.389 



0.462 

0.512 

0.531 


0.605 

0.671 

0.696 



0.942 


0.977 

120 

MB 




0.387 


EH 

0.501 



0.638 

HI 

0.813 


0.927 


mm 



0.431 

0.452 


0.460 



0.598 



0.742 




0.930 




m 

0.350 

0.389 


0.404 

0.460 


0.531 



0.670 




0.941 



P 

A* £ (0.90, 0.99) 

n 

4> 

50 

150 


400 

50 

150 

400 


50 

150 

400 

50 


150 


400 

40 


0.058 

0.063 


EH 

EH 

EH 



EH 


EH 

EH 


0.296 





0.092 

0.112 












0.601 


0.858 



0.155 

0.199 





0.364 



0.486 




0.619 


0.794 

80 

p2 

0.033 

EH 


0.072 


0.044 

EH 


0.035 

0.165 

0.385 



0.240 





0.067 

EH 


0.192 


0.115 



0.069 

0.404 

0.699 



0.551 


0.843 


R 2 lr 

0.149 

0.195 


0.246 

0.212 

0.283 



0.329 

0.471 

0.612 



0.597 


0.781 

120 

p2 

0.025 

0.028 


0.063 

0.030 

0.036 



0.025 


0.376 



0.222 




P M 

0.058 

0.069 


0.184 

0.072 

0.103 



0.057 

0.390 

0.694 



0.534 


0.838 


R 2 lr 

0.147 

0.194 


0.245 

0.207 

0.280 

EH 


0.322 

0.466 

0.609 

EH 


0.591 


0.777 

P 

M e (0.005, 0.12) 

n 

4> 

50 

150 


400 

50 

150 

400 


50 

150 

400 

50 


150 


400 

40 


0.067 

0.055 



0.072 

0.048 

0.070 


0.072 

0.144 

mi 

EH 


eh 


0.663 



0.044 

0.043 



0.049 

0.041 

0.035 


0.061 

0.067 

EH 

EH 


EH 


0.111 



0.214 

0.252 


0.294 

0.274 

0.327 

0.381 


0.378 

0.482 

0.576 

0.526 


0.700 


0.847 

80 

p2 

0.044 

0.031 


EH 

EH 

0.028 

EH 


EH 

0.113 

0.269 

EH 


0.271 




p py 

0.022 

0.021 


MM 


0.020 



EH 

0.037 

0.047 

EH 


0.046 


0.060 


r2 LR 

0.209 

0.249 


0.292 


0.320 



0.361 

0.470 

0.568 

0.504 


0.683 


0.838 

120 

p2 

0.037 

EH 


0.049 


0.022 

0.053 


0.037 

0.101 

0.262 

H 


0.252 


EH 


P M 

0.015 



0.015 


0.013 

0.011 


0.019 

0.027 

0.038 



0.032 


EH 


R 2 lr 

0.207 

EH 


0.291 

EH 

0.317 

0.375 


0.356 

0.465 

0.566 

EH 


0.677 


0.834 


size. 

When the mean responses are scattered on the standard unit interval (/+ G (0.20, 0.88)) 
the three statistics display similar values. It seems that neither the degree of intensity 
of nonconstant dispersion nor the simultaneous increase in the number of covariates in 
the two submodels noticeably affect the predictive power and fit of the model when the 
sample size is fixed. However, it is noteworthy a reduction of statistic values when the 
response values are close to one or close to zero, making clear the difficulty in fitting 
the regression model and obtaining good predictions when /i « 1 or /i « 1 and the 
precision is modelled. The minor values of Pp statistic reveals the problem in to make 
good predictions when p « 0, whereas when g « 1 this problem is singled out by 
smaller values of P 2 statistic. Here, the model fit is more affect when the number of 
covariates increases simultanealy in the two submodels. For instance consider n = 40, 
A = 100 and /x e (0.005,0.12). At the Scenario 5 (one covariate in both submodels), 

































































































































































we have P 2 = 0.8117, P | 7 = 0.3677 and R 2 lr = 0.8228. Whereas in Scenario 8 (four 
covariate in both submodels) we have P 2 = 0.8627, P | 7 = 0.4863 and R 2 lr = 0.6447; 

We also displayed in Table [2] the statistic values when the model is correctly spec¬ 
ified, but we introduced leverage points in the data. To that end, only the X 2 val¬ 
ues were obtained as random draws of the i( 3 ) distribution and concerned ourselves 
with /i G (0.20,0.88), which yielded one point which has leverage measure ten times 
greater than the average value when n = 40, two high leverage points when n = 80 
and thre e, n = 120. Here, we used as measure of leverage the leverage generalized 
( Fspinheira et ah. 20081 ). Notice that in Scenarios 5, 6 and 7 the P 2 measure seems 
more able to identify correctly that the leverage points affect the goodness of predic¬ 
tion than the P 2 7 measure. On the order hand, the P 2 7 outperforms the P 2 in Scenario 
8 . It is interesting to notice that in Scenario 5, which represents one covariate in both 
submodels, with the only one covariate of mean submodel had values generated from 
the 3 ) occurs the smaller values of the three statistics. Thus, the statistics correctly 
lead to the conclusion that as greatest is the influence of leverage point in the data, 
worst are the predictions and the model fit. 

Finally, were carried out Monte Carlo simulations to assess the performance of statis¬ 
tics when the dispersion modelling is neglected. To that end, the true data generating 
process considers varying dispersion but a fixed dispersion beta regression is estimated; 
see Table [3J In this case we have misspecification. Thus, we hope that the statistics 
display smaller values in comparison with Tabled In this sense, when /j, G (0.005, 0.12), 
it is noteworthy that the Pf 7 statistic outperforms the P 2 statistic identifying more 
emphatically the misspecification. For the other hand, when /r ~ 1 is the P 2 statistic 
that emphasizes the poor prediction power of the model when the varying dispersion 
is neglected. But, in fact, the statistics behavior slightly change. For instance, consider 
/i G (0.20, 0.88), n = 120, A = (20, 50,100) and Scenario 4 (three covariates in both sub¬ 
models). When the dispersion is correctly modelled; Tabled Pf 7 = (0.737, 0.740, 0.740) 
and when the varying dispersion is neglected; Table [3] P| = (0.696,0.680,0.651), re¬ 
spectively. 


4 Application 


In what follows we shall present an application based on real data. The application 
relates to the distribution of natural gas for home usage (e.g., in water heaters, ovens 
and stoves) in Sao Paulo, Brazil. Such a distribution is based on two factors: the 
simultaneity factor (P) and the total nominal power of appliances that use natural 
gas, computed power Q ma x■ Using these factors one obtains an indicator of gas release 
in a given tubulation section, namely: Q p = F x Qmax■ The simultaneity factor assumes 
values in (0,1), and can be interpreted as the probability of simultaneous appliances 
usage. Thus, based on P the company that supplies the gas decides how much gas to 
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Table 2 


Statistic values. Model correctly specified. g{nt) = l°g(/^t/(1 — /^t)) and h((j>t) = log 

t = 1 ,..., n. 



Scenarios 


Scenario 


Scenario 6 


Scenario 


Scenario 8 


Mean 

9(+t) = 01 + 02 a; t2 

s(+j) = 01 + 02 x t2 

s(+t) = 01+02 x t2 

sO t) 

= 01+02 x t2 + 


submodels 





+03 x t3 


+03 x t3 + 04 

x t4 

03 x t3 + 04 x t4 + 05 x t5 


Dispersion 

h(0t) = 71+72 2 t 2 

h(c/>t) = 71+72 2 t2 

flWt) =71+72 2 t2 

H<h) 

= 71 +72 2 t2 + 


submodels 





+73 2 t3 


+73 2 t3 + 74 z t4 

73 2 t3 + 74 2 t4 + 75 2 t5 

M 

M G (0.20, 0.88) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

p2 

0.794 

0.764 

0.742 

0.743 

0.721 

0.699 

0.792 

0.769 

0.749 

0.731 

0.731 

0.725 



0.823 

0.812 

0.806 

0.772 

0.762 

0.755 

0.850 

0.843 

0.838 

0.819 

0.824 

0.826 


r Ir 

0.834 

0.837 

0.842 

0.771 

0.784 

0.797 

0.779 

0.779 

0.785 

0.712 

0.738 

0.761 

80 

p 2 

0.773 

0.739 

0.714 

0.702 

0.674 

0.649 

0.745 

0.715 

0.687 

0.646 

0.642 

0.630 


p l 7 

0.803 

0.789 

0.781 

0.732 

0.717 

0.708 

0.814 

0.802 

0.794 

0.758 

0.762 

0.763 


R 2 lr 

0.840 

0.844 

0.850 

0.783 

0.796 

0.810 

0.789 

0.790 

0.796 

0.724 

0.749 

0.772 

120 

p2 

0.766 

0.731 

0.704 

0.688 

0.657 

0.630 

0.729 

0.696 

0.665 

0.615 

0.609 

0.596 



0.796 

0.781 

0.771 

0.717 

0.701 

0.690 

0.801 

0.788 

0.778 

0.737 

0.740 

0.740 


r Ir 

0.842 

0.846 

0.852 

0.786 

0.799 

0.813 

0.793 

0.793 

0.799 

0.727 

0.753 

0.775 

M 

fj. G (0.90, 0.99) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

p 2 

0.448 

0.569 

0.633 

0.527 

0.594 

0.650 

0.576 

0.671 

0.730 

0.789 

0.818 

0.841 


P | 7 

0.775 

0.870 

0.905 

0.829 

0.878 

0.909 

0.839 

0.901 

0.933 

0.950 

0.960 

0.968 


r2 LR 

0.455 

0.557 

0.617 

0.432 

0.494 

0.557 

0.353 

0.445 

0.513 

0.454 

0.501 

0.544 

80 

p 2 

0.410 

0.534 

0.599 

0.461 

0.534 

0.592 

0.482 

0.592 

0.661 

0.707 

0.743 

0.774 


P M 

0.770 

0.862 

0.898 

0.809 

0.861 

0.895 

0.804 

0.879 

0.915 

0.928 

0.942 

0.954 


r Ir 

0.491 

0.588 

0.644 

0.471 

0.530 

0.589 

0.399 

0.485 

0.551 

0.493 

0.536 

0.576 

120 

p2 

0.396 

0.522 

0.587 

0.436 

0.511 

0.571 

0.451 

0.566 

0.637 

0.678 

0.714 

0.750 


p l 7 

0.767 

0.859 

0.895 

0.801 

0.855 

0.890 

0.794 

0.872 

0.909 

0.921 

0.935 

0.948 


r2 LR 

0.501 

0.597 

0.653 

0.482 

0.541 

0.599 

0.412 

0.497 

0.563 

0.504 

0.544 

0.586 


M <E (0.005, 0.12) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

p2 

0.680 

0.769 

0.811 

0.641 

0.692 

0.732 

0.647 

0.739 

0.797 

0.800 

0.832 

0.862 


p li 

0.218 

0.298 

0.367 

0.257 

0.296 

0.341 

0.281 

0.332 

0.387 

0.409 

0.442 

0.486 


r2 LR 

0.719 

0.781 

0.822 

0.609 

0.639 

0.677 

0.464 

0.547 

0.621 

0.532 

0.585 

0.644 

80 

p2 

0.657 

0.748 

0.792 

0.584 

0.639 

0.683 

0.567 

0.675 

0.742 

0.721 

0.763 

0.804 


P M 

0.166 

0.248 

0.321 

0.175 

0.216 

0.262 

0.169 

0.220 

0.278 

0.271 

0.305 

0.355 


r2 lr 

0.743 

0.754 

0.761 

0.639 

0.667 

0.704 

0.504 

0.585 

0.657 

0.566 

0.617 

0.676 

120 

p2 

0.650 

0.741 

0.784 

0.565 

0.619 

0.664 

0.540 

0.653 

0.722 

0.691 

0.737 

0.781 


p It 

0.150 

0.232 

0.306 

0.148 

0.189 

0.236 

0.132 

0.183 

0.242 

0.225 

0.260 

0.311 


r2 LR 

0.750 

0.760 

0.774 

0.649 

0.675 

0.711 

0.515 

0.596 

0.668 

0.577 

0.628 

0.689 


covariate values generated from t( 3 ) M G (0.20.0.88). 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

P 2 

0.426 

0.401 

0.385 

0.733 

0.705 

0.680 

0.624 

0.603 

0.585 

0.775 

0.773 

0.768 


P | 7 

0.526 

0.544 

0.565 

0.822 

0.812 

0.806 

0.685 

0.700 

0.716 

0.751 

0.762 

0.768 


r2 LR 

0.515 

0.555 

0.593 

0.756 

0.772 

0.787 

0.641 

0.671 

0.697 

0.741 

0.776 

0.800 

80 

p2 

0.400 

0.364 

0.340 

0.696 

0.658 

0.628 

0.553 

0.516 

0.490 

0.710 

0.701 

0.692 



0.633 

0.618 

0.613 

0.793 

0.779 

0.769 

0.750 

0.744 

0.740 

0.673 

0.680 

0.686 


r Ir 

0.537 

0.577 

0.616 

0.767 

0.782 

0.799 

0.657 

0.685 

0.711 

0.754 

0.789 

0.815 

120 

p2 

0.386 

0.348 

0.322 

0.682 

0.641 

0.608 

0.523 

0.482 

0.453 

0.687 

0.675 

0.663 


P | 7 

0.639 

0.622 

0.614 

0.783 

0.767 

0.756 

0.742 

0.732 

0.726 

0.644 

0.650 

0.655 


r2 LR 

0.545 

0.584 

0.623 

0.770 

0.785 

0.802 

0.661 

0.689 

0.715 

0.759 

0.793 

0.819 


supply to a given residential unit. 


The data were analysed by Zerbinatti (20081 ). obtained from the 
Tecnologicas (IPT) and the Companhia de Gas de Sao Paulo 


Instituto de Pesquisas 
(COMGAS). The re- 
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Table 3 

Statistic values. True models: g(fi t ) = log(^/(l - n t )) = f3\ + A x u , log(<fo) = 71 + 7 * z u , 
i = 2,3,4, 5, and t = 1,..., n. Misspecified models: 4> fixed . 



Scenarios 


Scenario 


Scenario 6 


Scenario 


Scenario 8 



sOt) = 01+02 x t2 

g(g-t) = 0 i +02 x t 2 

g(lH) = 01+02 x t 2 

g(g-t) = 0i + 02 x t 2 + 


True 





+03 x t3 


03 

x t3 + 04 x t4 

03 x t3 + /3 4 x t4 + /3 5 x t5 


models 

HM = 71+72 2t2 

h Wt) = 71+72 z t2 

^Wt) =71+72 x t2 

h(r/>t) = 71+72 z t2 + 







+73 z t3 


+73 z t3 + 74 z t4 

73 z t3 + 74 z t4 + 75 z t3 


Estimated 

g(llt) = 01 + 02 x t 2 

=01+02 x t2 

g(g-t) = 0 i +02 xt 2 

g(g-t) = 0i + 02 x t 2 + 


models 





+03 x t3 


+/3 3 x t 3 + (3 4 

x t4 

03 x t3 + 04 x t4 + 05 x t5 

M 

A* G (0.20, 0.88) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

P 2 

0.778 

0.734 

0.699 

0.761 

0.721 

0.677 

0.707 

0.674 

0.639 

0.718 

0.707 

0.685 



0.792 

0.761 

0.740 

0.752 

0.717 

0.684 

0.775 

0.757 

0.735 

0.776 

0.768 

0.752 


R 2 lr 

0.777 

0.734 

0.701 

0.722 

0.676 

0.624 

0.607 

0.555 

0.505 

0.571 

0.549 

0.512 

80 

p2 

0.759 

0.711 

0.671 

0.728 

0.682 

0.630 

0.650 

0.608 

0.562 

0.643 

0.626 

0.594 


P M 

0.772 

0.738 

0.714 

0.714 

0.673 

0.631 

0.728 

0.707 

0.679 

0.717 

0.703 

0.680 


P Ir 

0.781 

0.739 

0.707 

0.732 

0.687 

0.637 

0.630 

0.582 

0.533 

0.600 

0.579 

0.544 

120 

p2 

0.753 

0.703 

0.660 

0.717 

0.669 

0.614 

0.631 

0.584 

0.534 

0.617 

0.598 

0.560 


p m 

0.764 

0.729 

0.702 

0.702 

0.656 

0.611 

0.712 

0.688 

0.660 

0.696 

0.680 

0.651 


R 2 lr 

0.783 

0.741 

0.708 

0.735 

0.690 

0.640 

0.637 

0.589 

0.541 

0.608 

0.588 

0.552 


M G (0.90, 0.99) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

p2 

0.141 

0.163 

0.175 

0.194 

0.208 

0.220 

0.280 

0.292 

0.300 

0.347 

0.350 

0.351 


P M 

0.274 

0.349 

0.390 

0.314 

0.360 

0.395 

0.433 

0.460 

0.480 

0.560 

0.550 

0.545 


p Ir 

0.250 

0.244 

0.233 

0.177 

0.153 

0.134 

0.058 

0.039 

0.023 

0.086 

0.064 

0.041 

80 

p2 

0.093 

0.114 

0.127 

0.115 

0.127 

0.136 

0.165 

0.172 

0.176 

0.215 

0.211 

0.209 



0.242 

0.320 

0.364 

0.253 

0.296 

0.327 

0.332 

0.352 

0.368 

0.464 

0.442 

0.431 


R 2 lr 

0.275 

0.268 

0.257 

0.213 

0.191 

0.171 

0.115 

0.097 

0.082 

0.162 

0.139 

0.118 

120 

p2 

0.077 

0.098 

0.111 

0.089 

0.100 

0.109 

0.125 

0.130 

0.133 

0.170 

0.163 

0.159 


P £7 

0.231 

0.311 

0.356 

0.231 

0.274 

0.304 

0.295 

0.313 

0.325 

0.428 

0.400 

0.385 


R 2 lr 

0.282 

0.275 

0.265 

0.225 

0.203 

0.182 

0.131 

0.114 

0.098 

0.181 

0.157 

0.138 

M 

11 G (0.005, 0.12) 

n 

A 

20 

50 

100 

20 

50 

100 

20 

50 

100 

20 

50 

100 

40 

p2 

0.295 

0.312 

0.316 

0.270 

0.285 

0.295 

0.317 

0.331 

0.338 

0.371 

0.374 

0.377 


P l~i 

0.119 

0.123 

0.128 

0.168 

0.172 

0.179 

0.243 

0.252 

0.257 

0.278 

0.292 

0.301 


R 2 lr 

0.560 

0.522 

0.484 

0.400 

0.363 

0.326 

0.159 

0.143 

0.130 

0.165 

0.146 

0.124 

80 

p2 

0.272 

0.288 

0.290 

0.202 

0.217 

0.226 

0.205 

0.214 

0.218 

0.246 

0.239 

0.234 


P l~1 

0.068 

0.073 

0.077 

0.086 

0.090 

0.096 

0.127 

0.133 

0.137 

0.138 

0.149 

0.155 


R 2 lr 

0.603 

0.576 

0.545 

0.439 

0.422 

0.399 

0.216 

0.207 

0.201 

0.245 

0.223 

0.204 

120 

p2 

0.264 

0.280 

0.283 

0.177 

0.194 

0.203 

0.166 

0.171 

0.175 

0.202 

0.189 

0.180 



0.052 

0.058 

0.062 

0.059 

0.063 

0.069 

0.087 

0.092 

0.095 

0.093 

0.101 

0.105 


R 2 lr 

0.618 

0.596 

0.570 

0.449 

0.439 

0.427 

0.231 

0.222 

0.220 

0.266 

0.2410 

0.2520 


sponse variable (y) are the simultaneity factors of 42 valid measurements of sampled 
households, and the covariate is the computed power. The 1 simulta neity factors ranged 
from 0.02 to 0.46, being the median equals 0.07. Zerbinatti f2008h modeled such data 
and concluded that the best performing model was the beta regression model based on 
logit link and log of computed power used as covariate. However, the author sh ows t hat 
the beta regression model can underpredict the response. Thus, Espinheira ct al. (2014) 
argue that it is important to have at disposal prediction intervals that can be used with 
beta regressions. To that end, the authors built and evaluated bootstrap-based predic¬ 
tion intervals for the response for the class of beta regression models. They applied the 
approach to the data on simultaneity factor. However, a important step in this case was 
the selection of the model with the best predictive power. To reach this aim the authors 
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Table 4 

Statistic values from the candidate models. Data on simultaneity factor 



Candidate models 

Mean 

l°g(Mt/(l - iH)) = 

-l°g(-l°g(/u)) = 

l°g(Mt/(l - Mt)) = 

log( log (/it)) = 

submodel 

Pi + P2 Xt2 

Pi + P2 Xt2 

Pi + P2 X t 2 

Pi + P2 X t 2 

Dispersion 



log (</>t) = 

log (ih) = 

submodel 



71 + 72 Xt2 

71 + 72 Xt2 

p2 

0.423 

0.662 

0.461 

0.694 

% 

0.100 

0.100 

0.131 

0.203 

R lli 

0.701 

0.683 

0.701 

0.701 


used a simplified version of PRESS statist ic given by PRESS = (yt — V{t)) 2 
which selected the same model of IZerbinatti (20 0 81). Here we aim at selecting the better 
predictive model to the data on simultaneity factor using the P 2 and P| statistics. 
We also consider the R\ r as the measure of goodness-of-fit model. Since that the re¬ 
sponse is the simultaneity factor and the covariate X 2 is the log of computed power, 
we considered four candidate models. At the outset, we consider two beta regression 
model with fixed dispersion, the first one using logit link function for /j, and the second 
one using log-log link function. Then, in the following two models the dispersion is 
nonconstant, with the logit and log-log submodels for /j, and log submodels for (j). Then 
statistic values are presented in Table SJ Here, we consider that the predictive power 
of the model is better when the measures P 2 and P | 7 are close to one. 


The TablcQ] displays important informations. First, we notice that by the R 2 rv measures 
three models equally fits well. Second, since that the responses are close to of lower 
limit of the standard unit interval the statistics display small values, in special the Pp 
statistic. Third, the P 2 and Pj 7 measures lead to the same conclusions, selecting the 
beta regression model with link log-log for the mean submodel and link log for the 
dispersion submodel, as the best model to make prediction to the data on simultaneity 
factor. The maximum likelihood parameter estimates are f3\ = —0.63, (3 2 = —0.31, 
71 = 3.81 and 72 = 0.77. Furthermore, the estimative of intensity of nonconstant 
dispersion is A = 21.16 (see (16)), such that 0 max = 242.39 and <t > min = 11.45 . Selected 
among the candidates the best model in a predictive perspective, we still can use 
the PRESS statistic to identifying which observations are more difficult to predict. 
In this sense, we plot the individual components of PRESS and PRESS^ versus the 
observations index, Figure UK a) and [0(b), respectively. Overall, Figure |Tj shows that 
the cases 3, 11 , 16, 21, 31, 33 and 35 arise as the observations with more predictive 
difficulty and are worthy of further investigation. 


5 Conclusion 


In this paper we develop the P 2 and Pf 7 based on two versions of PRESS statistics that 
we proposed for the class of beta regression models. The P 2 coefficient consider the 
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Fig. 1. PRESS plots. PRESS (a) and PRESS^ (b). 


PRESS statistic based on ordinary residual from the Fisher’s scoring iterative algorithm 
for estimating (3 whereas Pp is based on a new residual which is a combination of 
ordinaries residuals from the Fisher’s scoring iterative algorithm for estimating f3 and 
7 . We have presented the results of Monte Carlo simulations carried out to evaluate the 
performance of predictive coefficients. Additionally, to access the goodness-of-ht model 
we used the R 2 lr . We consider different scenarios include misspecification of omitted 
covariates and negligence of varying dispersion, simultaneous increase in the number of 
covariates in the two submodels (mean and dispersion) and presence of leverage points 
in the data. Overall, the coefficients P 2 and P | 7 perform similar and both showed 
enable to identify when the model are not reliable or when is more difficult to make 
prediction. In this situations, the R 2 lr statistic also revels that the model does not fit 
well. It is noteworthy that when the response values are close to one or close to zero the 
power predictive of the model is substantially affected even under correct specification. 
Finally, an empirical application was performed. 
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A Appendix A: Fisher’s scoring iterative algorithm 


In what follows we shall present the score function and Fishe r’s information for f3 and 7 
in the class of varying dispersion beta regression models ( Ferrari et al.. 201 ll ). We shall also 
present results that are useful to the derivation of the residuals proposed in this paper. The 
log-likelihood function for model (HJ) is given by l(/3, 7 ) = J2t=i^t(yt,4>t), where £t{yt,4>t) = 
log rOt) - log T(y t 4>) - logr((l - Ht)4>t) + ilMpt - l) log yt + {{ 1 - yt)<k - 1} log(l - yt )• The 
score function for j3 is thus Up(/3, 7 ) = X T QT(y* — y*), X is an n x k, = diag(0i,... ,<p n ) 
and fth elements of y* and y* being given in flSJ), 


T = diag{l/ g'(yi ),..., l/g'(y n )}- (A.l) 

the score function for 7 can be written as I/ 7 (/3, 7 ) = Z T Ha, where, Z is an n x q, at being 
given in m and 

H = diag{l//;/(V/>i),..., 1 /ti{<t> n )}. (A.2) 

The components of Fisher’s information matrix are Kpp = X T <f>WX, = X T CTHZ 

and A ' 77 = Z T DZ. Here, W = diag{u>i,..., w n }; where 

wt = (f>tv t [l/{g'{yt)} 2 } and v t = {ip'{ytpt) H- (( 1 - Vt)Pt)} • (A.3) 

Also, C = diagjci,... ,c n }, with c t = 4> t {ip'{y t ^ t ) y t - ip'{{l - yt)(pt)iX ~ Vt)} and D = 
diagjdi,..., d n }, with 

dt = ^ {h’iyt)} 2 and Q = W^^Vt + V>'((i - vt)<Pt){ 1 - Vt) 2 ~ ip'(<Pt)} • (A- 4 ) 


The Fisher’s scoring iterative schemes used for estimating /3 and 7 can be written, respectively, 
as 

/3 (m+ i) = /3 (m) + and 7 ( m+1 ) = 7 H + (A.5) 

where m = 0 , 1 , 2 ,... are the iterations that are performed until convergence, which occurs 
when the distance between / 3 l m+1 ) and (5 becomes smaller than a given small constant. 

It is important to note that the beta density (pQ) belongs to canonical two-parameter exponen¬ 
tial family. Indeed, f {yt, Vt, Pt) = expjriTi + t 2 T 2 - A(r)}(l/y t (l - y t )), where r = (ri,r 2 ) = 
{VtPtAt), (Ti,T 2 ) = (log{y t /(l - y t )},log(l - Y t )) and A[t) = {-logr(</> t ) + \ogT{y t <p t ) + 
logr((l - Vt)4>t)}. Thus, 

E(Ti) = E(1?) = dA(r)/dn = 1 - y t )p t ) = y*, (A. 6 ) 


E(T 2 ) = E(log(l - Y t )) = dA(r)/dT 2 = ip({l - Vt)Pt) - ip{<pt), 
Var(Ti) = Var(Y)*) = d 2 A{T)/drf = ip'{/.Hpt) + ^'((1 - Vt)<Pt) = v t, 
Var(T 2 ) = Var(log(l - Y t )) = d 2 A{r)/dT 2 = ip' {{l - y t )pt) ~ ip'(<pt), 


and 


Cov(Ti,T 2 ) = d 2 A(t) /dTidr 2 = -ip' {{l - Vt)<Pt)- 


More details see lLehmann and Casella (19981 . p. 27). 


(A.7) 

(A.8) 

(A.9) 

(A.10) 
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